privacy amplification
Privacy Amplification by Mixing and Diffusion Mechanisms
Borja Balle, Gilles Barthe, Marco Gaboardi, Joseph Geumlek
A fundamental result in differential privacy states that the privacy guarantees of a mechanism are preserved by any post-processing of its output. In this paper we investigate under what conditions stochastic post-processing can amplify the privacy of a mechanism. By interpreting post-processing as the application of a Markov operator, we first give a series of amplification results in terms of uniform mixing properties of the Markov process defined by said operator. Next we provide amplification bounds in terms of coupling arguments which can be applied in cases where uniform mixing is not available. Finally, we introduce a new family of mechanisms based on diffusion processes which are closed under post-processing, and analyze their privacy via a novel heat flow argument. On the applied side, we generalize the analysis of "privacy amplification by iteration" in Noisy SGD and show it admits an exponential improvement in the strongly convex case, and study a mechanism based on the Ornstein-Uhlenbeck diffusion process which contains the Gaussian mechanism with optimal post-processing on bounded inputs as a special case.
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (3 more...)
Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences
Borja Balle, Gilles Barthe, Marco Gaboardi
Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. One important tool is the so-called "privacy amplification by subsampling" principle, which ensures that a differentially private mechanism run on a random subsample of a population provides higher privacy guarantees than when run on the entire population. Several instances of this principle have been studied for different random subsampling methods, each with an ad-hoc analysis. In this paper we present a general method that recovers and improves prior analyses, yields lower bounds and derives new instances of privacy amplification by subsampling. Our method leverages a characterization of differential privacy as a divergence which emerged in the program verification community. Furthermore, it introduces new tools, including advanced joint convexity and privacy profiles, which might be of independent interest.
- North America > Canada > Quebec > Montreal (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (2 more...)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (3 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
18561617ca0b4ffa293166b3186e04b0-Paper-Conference.pdf
However, foundational theoretical questions about this algorithm's privacy loss remain open--even in the seemingly simple setting of smooth convex losses over a bounded domain. Our main result resolves these questions: for a large range of parameters, we characterize the differential privacy up to a constant.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Singapore (0.04)
- North America > United States > California (0.04)
Privacy Amplification by Missing Data
Roburin, Simon, Pinot, Rafaël, Scornet, Erwan
Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual confidentiality. At the same time, these applications often involve datasets with missing values due to non-response, data corruption, or deliberate anonymization. Missing data is traditionally viewed as a limitation because it reduces the information available to analysts and can degrade model performance. In this work, we take an alternative perspective and study missing data from a privacy preservation standpoint. Intuitively, when features are missing, less information is revealed about individuals, suggesting that missingness could inherently enhance privacy. We formalize this intuition by analyzing missing data as a privacy amplification mechanism within the framework of differential privacy. We show, for the first time, that incomplete data can yield privacy amplification for differentially private algorithms.
- North America (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > Middle East > Jordan (0.04)
Privacy Amplification via Compression: Achieving the Optimal Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation
Privacy and communication constraints are two major bottlenecks in federated learning (FL) and analytics (FA). We study the optimal accuracy of mean and frequency estimation (canonical models for FL and FA respectively) under joint communication and $(\varepsilon, \delta)$-differential privacy (DP) constraints. We consider both the central and the multi-message shuffled DP models. We show that in order to achieve the optimal $\ell_2$ error under $(\varepsilon, \delta)$-DP, it is sufficient for each client to send $\Theta\left( n \min\left(\varepsilon, \varepsilon^2\right)\right)$ bits for FL %{\color{blue}(assuming the dimension $d \gg n \min\left(\varepsilon, \varepsilon^2\right)$)} and $\Theta\left(\log\left( n\min\left(\varepsilon, \varepsilon^2\right) \right)\right)$ bits for FA to the server, where $n$ is the number of participating clients. Without compression, each client needs $O(d)$ bits and $O\left(\log d\right)$ bits for the mean and frequency estimation problems respectively (where $d$ corresponds to the number of trainable parameters in FL or the domain size in FA), meaning that we can get significant savings in the regime $ n \min\left(\varepsilon, \varepsilon^2\right) = o(d)$, which is often the relevant regime in practice.
Privacy Amplification via Random Check-Ins
Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data. Two standard approaches, privacy amplification by subsampling, and privacy amplification by shuffling, permit adding lower noise in DP-SGD than via na\{\i}ve schemes. A key assumption in both these approaches is that the elements in the data set can be uniformly sampled, or be uniformly permuted --- constraints that may become prohibitive when the data is processed in a decentralized or distributed fashion. In this paper, we focus on conducting iterative methods like DP-SGD in the setting of federated learning (FL) wherein the data is distributed among many devices (clients). Our main contribution is the \emph{random check-in} distributed protocol, which crucially relies only on randomized participation decisions made locally and independently by each client. It has privacy/accuracy trade-offs similar to privacy amplification by subsampling/shuffling. However, our method does not require server-initiated communication, or even knowledge of the population size. To our knowledge, this is the first privacy amplification tailored for a distributed learning framework, and it may have broader applicability beyond FL. Along the way, we improve the privacy guarantees of amplification by shuffling and show that, in practical regimes, this improvement allows for similar privacy and utility using data from an order of magnitude fewer users.